-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add command to update with GFA #115
base: main
Are you sure you want to change the base?
Conversation
c0b411b
to
8d869e8
Compare
@@ -158,8 +158,8 @@ pub struct Link<T: SampleType, S: Opt, U: Opt> { | |||
#[derive(Debug, Clone, Ord, Eq, PartialOrd, PartialEq)] | |||
pub struct Path<T: SampleType, S: Opt, U: Opt> { | |||
pub name: String, | |||
pub dir: Vec<bool>, | |||
pub nodes: Vec<T>, | |||
pub strands: Vec<bool>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Felt like renaming some of these fields to be clearer
src/updates/gfa.rs
Outdated
let mut path_strands_by_name = HashMap::new(); | ||
for path in &gfa.paths { | ||
let path_name = path.name.clone(); | ||
path_segments_by_name.insert(path_name.clone(), path.segments.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to clone these or can we pass a reference? seems like it can be a problem w/ bigger GFAs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, changed to reference
src/updates/gfa.rs
Outdated
let path_name = &path.name; | ||
if matched_path_names.contains(path_name.as_str()) { | ||
for segment_id in path.segments.iter() { | ||
matched_path_name_by_segment_id.insert(segment_id, path_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couldn't a segment be in multiple paths? Should this be a set of paths the segment belongs to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, looking below a path segment id can be overwritten when shared and will flag it as an unmatched path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, yep, fixed
src/updates/gfa.rs
Outdated
let mut previous_node_strand = Strand::Forward; | ||
let mut new_path_edges = vec![]; | ||
for (i, segment_id) in unmatched_path_segment_ids.iter().enumerate() { | ||
if existing_path_ranges_by_segment_id.contains_key(segment_id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you can do something like if let Some(start, end) = existing_path_ranges_by_segment_id.get(segment_id)
to avoid the double lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, done
src/updates/gfa.rs
Outdated
.sequence_type("DNA") | ||
.sequence(segment_sequence) | ||
.save(conn); | ||
let node_id = Node::create(conn, &sequence.hash, None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should make a node hash so this can be repeated without infinite updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, but let me know if the hash string could be better
src/updates/gfa.rs
Outdated
); | ||
|
||
let temp_dir = tempdir().unwrap(); | ||
let gfa_diff_path = temp_dir.path().join("parent-child-diff.gfa"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we save this as a general fixture to have a test where we just have the initial simple.fa and no other intermediate assets the alternative paths could maybe be coming from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, i gues that is the walk-diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, did this for the path diff too
8aa9943
to
69b9887
Compare
69b9887
to
f5160a8
Compare
src/updates/gfa.rs
Outdated
let node_id = Node::create( | ||
conn, | ||
&sequence.hash, | ||
calculate_hash(&format!("{segment_id}_{hash}", hash = &sequence.hash)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might want to prefix with the path id as well, so we don't have node reuse if this segment is in multiple paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a chicken-and-egg problem because we don't have the path ID yet, we need the nodes + edges between them in order to create the path. I added the path name here instead, that OK?
No description provided.